36 research outputs found

    Phylogenetic inference under varying proportions of indel-induced alignment gaps

    Background: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods.

    Results: (1) In general, there was a strong, almost deterministic, relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML, and "MLε", a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary presence/absence character data, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps.

    Conclusion: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy.
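The two gap-related quantities in this abstract, the proportion of gapped sites and the coding of gaps as binary presence/absence characters, can be illustrated with a minimal Python sketch. The function names, the toy alignment, and the per-column coding scheme are assumptions made for illustration; the abstract does not specify the exact coding procedure the study used, and published indel-coding schemes are generally more elaborate than this.

```python
def gap_fraction(alignment):
    """Fraction of alignment columns that contain at least one gap ('-')."""
    columns = list(zip(*alignment.values()))
    gapped = sum(1 for col in columns if "-" in col)
    return gapped / len(columns)


def code_gaps_as_binary(alignment):
    """Code each gapped column as a presence/absence character.

    Simplified, hypothetical scheme: for every column containing a gap,
    a taxon gets '1' if it has a nucleotide there and '0' if it has a gap.
    """
    names = list(alignment)
    columns = list(zip(*alignment.values()))
    gapped_cols = [col for col in columns if "-" in col]
    return {
        name: "".join("0" if col[i] == "-" else "1" for col in gapped_cols)
        for i, name in enumerate(names)
    }


# Toy alignment (made-up data, four taxa, eight sites).
alignment = {
    "taxonA": "ACGT-ACG",
    "taxonB": "ACGTTACG",
    "taxonC": "AC--TACG",
    "taxonD": "ACGTTACG",
}

print(gap_fraction(alignment))         # 3 of 8 columns contain a gap
print(code_gaps_as_binary(alignment))  # one binary string per taxon
```

Treating gaps as missing data would simply discard this binary matrix; appending it to the nucleotide data is one way a method can attribute phylogenetic signal to gapped sites.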

    ETV2/ER71 regulates the generation of FLK1+ cells from mouse embryonic stem cells through miR-126-MAPK signaling

    Previous studies, including ours, have demonstrated a critical function of the transcription factor ETV2 (ets variant 2; also known as ER71) in determining the fate of cardiovascular lineage development. However, the underlying mechanisms of ETV2 function remain largely unknown. In this study, we demonstrated a novel function of the miR (micro RNA)-126-MAPK (mitogen-activated protein kinase) pathway in ETV2-mediated generation of FLK1 (fetal liver kinase 1; also known as VEGFR2)+ cells from mouse embryonic stem cells (mESCs). By performing a series of experiments including miRNA sequencing and ChIP (chromatin immunoprecipitation)-PCR, we found that miR-126 is directly induced by ETV2. We further found that miR-126 can positively regulate the generation of FLK1+ cells by activating the MAPK pathway through targeting SPRED1 (sprouty-related EVH1 domain containing 1). We also showed evidence that JUN/FOS activate the enhancer region of FLK1 through AP1 (activator protein 1) binding sequences. Our findings provide insight into the novel molecular mechanisms of ETV2 function in regulating cardiovascular lineage development from mESCs.

    PhiSiGns: an online tool to identify signature genes in phages and design PCR primers for examining phage diversity

    Background: Phages (viruses that infect bacteria) have gained significant attention because of their abundance, diversity and important ecological roles. However, the lack of a universal gene shared by all phages presents a challenge for phage identification and characterization, especially in environmental samples where it is difficult to culture phage-host systems. Homologous conserved genes (or "signature genes") present in groups of closely-related phages can be used to explore phage diversity and define evolutionary relationships amongst these phages. Bioinformatic approaches are needed to identify candidate signature genes and design PCR primers to amplify those genes from environmental samples; however, there is currently no existing computational tool that biologists can use for this purpose.

    Results: Here we present PhiSiGns, a web-based and standalone application that performs a pairwise comparison of each gene present in user-selected phage genomes, identifies signature genes, generates alignments of these genes, and designs potential PCR primer pairs. PhiSiGns is available at (http://www.phantome.org/phisigns/; http://phisigns.sourceforge.net/) with a link to the source code. Here we describe the specifications of PhiSiGns and demonstrate its application with a case study.

    Conclusions: PhiSiGns provides phage biologists with a user-friendly tool to identify signature genes and design PCR primers to amplify related genes from uncultured phages in environmental samples. This bioinformatics tool will facilitate the development of novel signature genes for use as molecular markers in studies of phage diversity, phylogeny, and evolution.

    Impact of molecular evolutionary footprints on phylogenetic accuracy: a simulation study

    No full text
    An accurately inferred phylogeny is important to the study of molecular evolution. Factors impacting the accuracy of a phylogenetic tree can be traced to several consecutive steps leading to the inference of the phylogeny. In this simulation-based study, our focus is on the impact of certain evolutionary features of the nucleotide sequences themselves in the alignment, rather than any source of error during the process of sequence alignment or due to the choice of the method of phylogenetic inference. Nucleotide sequences can be characterized by summary statistics such as sequence length and base composition. When two or more such sequences need to be compared to each other (as in an alignment prior to phylogenetic analysis), additional evolutionary features come into play, such as the overall rate of nucleotide substitution, the ratio of two specific instantaneous rates of substitution (the rates at which transitions and transversions occur), and the shape parameter of the gamma distribution (which quantifies the extent of heterogeneity in substitution rate among sites in an alignment). We studied the implications of the following five sequence parameters, individually and in combination: sequence length, substitution rate, nucleotide base composition, the transition-transversion rate ratio and the rate heterogeneity among the sites. We found that the transition-transversion rate ratio, or kappa, has a significant impact on phylogenetic accuracy, with a strong positive interaction with accuracy at high substitution rates, contrary to general belief. This work on a known expected tree has implications for researchers in the field and would enable them to choose from among the multiple genes typically available today for an accurate phylogenetic inference.
    DNA sequences diverge from their ancestral sequences by means of evolutionary events (other than those mentioned above) such as deletion (deletion of one or more nucleotides from a sequence) or insertion (insertion of one or more nucleotides into a sequence) events, commonly referred to as gaps in a sequence alignment. We have also investigated the relationship between the number of gaps and phylogenetic accuracy, when the gaps are introduced in an alignment to reflect indel (insertion/deletion) events during the evolution of DNA sequences. DNA sequence alignments were generated using computer simulation, while varying several sequence parameters and introducing both substitution and insertion/deletion events, along a 16-taxon model tree, and systematically varying the expected proportion of gapped sites. The resulting alignments were subjected to commonly used gap treatment methods and methods of phylogenetic inference. The results showed that, in general, there is a strong, almost deterministic relationship between the amount of gap in the data and the level of phylogenetic accuracy when the amount of gap was high. Our results also suggest that, as long as the gaps in the alignment are a consequence of indel events in the evolutionary history of the sequences, the accuracy of phylogenetic analysis is likely to improve if alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, and if the phylogenetic signal provided by indels is harnessed, for example, by treating the gaps as binary characters in Bayesian or Maximum Parsimony analyses, or in an integrated manner along with substitution events.
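The transition/transversion distinction that underlies the kappa parameter in this abstract can be illustrated with a small Python sketch that counts observed transitions and transversions between two aligned sequences. The function name and the example sequences are hypothetical. Kappa in the abstract is a model parameter (a ratio of instantaneous substitution rates), so the raw counts below are only a rough illustration of the underlying distinction, not an estimate of kappa.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count observed transitions and transversions between two aligned sequences.

    Transitions are purine<->purine or pyrimidine<->pyrimidine differences;
    transversions are purine<->pyrimidine differences. Sites with a gap or an
    ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up example: A/G and C/T differences are transitions, A/C is a transversion.
ts, tv = count_ts_tv("ACGTACGTAC", "GCGTATGTCC")
print(ts, tv)
```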

    shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    No full text
    While many methods exist for integrating multi-omics data or defining gene sets, no single tool defines gene sets by merging multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to make a previously published method, Gene Integrated Set Profile Analysis (GISPA), more usable for researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/

    Sections of shinyGISPA showing gene sets profile support by “between sample differences” and “between feature differences” using the example data sets.

    No full text

    A screenshot of the shinyGISPA web user-interface for a two-feature analysis.

    No full text

    Results diagnostics plots snapshot of the shinyGISPA user-interface using the example data sets.

    No full text